Bootstrapping parsers via syntactic projection across parallel texts

نویسندگان

  • Rebecca Hwa
  • Philip Resnik
  • Amy Weinberg
  • Clara I. Cabezas
  • Okan Kolak
چکیده

Broad coverage, high quality parsers are available for only a handful of languages. A prerequisite for developing broad coverage parsers for more languages is the annotation of text with the desired linguistic representations (also known as “treebanking”). However, syntactic annotation is a labor intensive and time-consuming process, and it is difficult to find linguistically annotated text in sufficient quantities. In this article, we explore using parallel text to help solving the problem of creating syntactic annotation in more languages. The central idea is to annotate the English side of a parallel corpus, project the analysis to the second language, and then train a stochastic analyzer on the resulting noisy annotations. We discuss our background assumptions, describe an initial study on the “projectability” of syntactic relations, and then present two experiments in which stochastic parsers are developed with minimal human intervention via projection from English.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Measuring the Divergence of Dependency Structures Cross-Linguistically to Improve Syntactic Projection Algorithms

Syntactic parses can provide valuable information for many NLP tasks, such as machine translation, semantic analysis, etc. However, most of the world’s languages do not have large amounts of syntactically annotated corpora available for building parsers. Syntactic projection techniques attempt to address this issue by using parallel corpora between resource-poor and resource-rich languages, boo...

متن کامل

Learning Event Expressions via Bilingual Structure Projection

Identifying events of a specific type is a challenging task as events in texts are described in numerous and diverse ways. Aiming to resolve high complexities of event descriptions, previous work (Huang and Riloff, 2013) proposes multi-faceted event recognition and a bootstrapping method to automatically acquire both event facet phrases and event expressions from unannotated texts. However, to ...

متن کامل

Multilingual Projection for Parsing Truly Low-Resource Languages

We propose a novel approach to cross-lingual part-of-speech tagging and dependency parsing for truly low-resource languages. Our annotation projection-based approach yields tagging and parsing models for over 100 languages. All that is needed are freely available parallel texts, and taggers and parsers for resource-rich languages. The empirical evaluation across 30 test languages shows that our...

متن کامل

Capturing divergence in dependency trees to improve syntactic projection

Obtaining syntactic parses is an important step in many NLP pipelines. However, most of the world’s languages do not have a large amount of syntactically annotated data available for building parsers. Syntactic projection techniques attempt to address this issue by using parallel corpora consisting of resource-poor and resource-rich language pairs, taking advantage of a parser for the resource-...

متن کامل

Transferring Semantic Roles Using Translation and Syntactic Information

Our paper addresses the problem of annotation projection for semantic role labeling for resource-poor languages using supervised annotations from a resource-rich language through parallel data. We propose a transfer method that employs information from source and target syntactic dependencies as well as word alignment density to improve the quality of an iterative bootstrapping method. Our expe...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Natural Language Engineering

دوره 11  شماره 

صفحات  -

تاریخ انتشار 2005